Non-parametric Bayesian modelling of digital gene expression data

Abstract

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.
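
The clustering idea in the abstract can be illustrated with a minimal sketch of one sweep of a blocked Gibbs sampler over a truncated stick-breaking mixture. The abstract does not specify the likelihood or the prior, so everything concrete here is an assumption made for illustration: the negative binomial likelihood for over-dispersed counts, the truncation level, the concentration parameter `alpha`, and all function names. The per-cluster means and dispersions are held fixed, so this shows only the cluster-assignment and weight updates, not the full model.

```python
import math
import random


def stick_breaking_weights(betas):
    """Convert stick-breaking fractions into mixture weights.

    The last fraction is treated as 1 so the truncated weights sum to one.
    Assumes every fraction lies strictly between 0 and 1.
    """
    weights, remaining = [], 1.0
    for k, b in enumerate(betas):
        frac = 1.0 if k == len(betas) - 1 else b
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights


def neg_binomial_logpmf(y, mean, dispersion):
    """Log-probability of count y; variance = mean + dispersion * mean**2.

    The negative binomial is an assumed choice for over-dispersed counts.
    """
    r = 1.0 / dispersion
    p = r / (r + mean)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


def sample_indicators(counts, weights, means, dispersions, rng):
    """Draw one cluster label per gene from its conditional posterior."""
    labels = []
    for gene_counts in counts:
        log_post = []
        for w, m, d in zip(weights, means, dispersions):
            loglik = sum(neg_binomial_logpmf(y, m, d) for y in gene_counts)
            log_post.append(math.log(w) + loglik)
        top = max(log_post)  # subtract the maximum for numerical stability
        probs = [math.exp(lp - top) for lp in log_post]
        labels.append(rng.choices(range(len(weights)), weights=probs)[0])
    return labels


def sample_stick_fractions(labels, n_clusters, alpha, rng):
    """Update stick fractions given labels: Beta(1 + n_k, alpha + sum_{j>k} n_j)."""
    sizes = [labels.count(k) for k in range(n_clusters)]
    return [rng.betavariate(1 + sizes[k], alpha + sum(sizes[k + 1:]))
            for k in range(n_clusters)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical tag counts: four genes, two libraries each.
    counts = [[2, 3], [1, 4], [200, 180], [220, 210]]
    weights = stick_breaking_weights([0.5, 0.9])
    labels = sample_indicators(counts, weights, [3.0, 200.0], [0.1, 0.1], rng)
    betas = sample_stick_fractions(labels, 2, 1.0, rng)
    print(labels, stick_breaking_weights(betas))
```

Genes that share a label pool their counts, which is how a sampler of this kind can compensate for few replicates. In a complete sampler the per-cluster parameters would also be resampled each sweep, and the truncation level would be set well above the expected number of occupied clusters so that the effective cluster count is inferred from the data.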